Results 1 - 20 of 17,743
1.
PLoS One; 19(4): e0300842, 2024.
Article in English | MEDLINE | ID: mdl-38598429

ABSTRACT

Maze-solving is a classical mathematical task that has recently been achieved analogously using various eccentric media and devices, such as living tissues, chemotaxis, and memristors. Plasma generated in a labyrinth of narrow channels can also act as a route finder to the exit. In this study, we experimentally observe maze-route finding in a plasma system based on a mixed discharge scheme of direct-current (DC) volume mode and alternating-current (AC) surface dielectric-barrier discharge, and we computationally generalize this function in a reinforcement-learning model. In our plasma system, we install two electrodes at the entry and the exit of a square-lattice configuration of narrow channels whose cross section is 1×1 mm² and whose total length is around ten centimeters. Visible emissions in low-pressure Ar gas are observed after plasma ignition, and the plasma starting from a given entry location reaches the exit as the discharge voltage increases; the degree of route convergence is quantified by Shannon entropy. A similar short-path route is reproduced in a reinforcement-learning model in which the electric potentials set by the discharge voltage are replaced by rewards of positive and negative sign or polarity. The model is not a rigorous numerical representation of the plasma, but it shares common points with the experiments along with a rough sketch of the underlying processes (charges in the experiments and rewards in the model). This finding indicates that a plasma-channel network performs an analog computing function similar to a reinforcement-learning algorithm, slightly modified in this study.
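As a rough illustration of the analogy drawn here, the sketch below runs tabular Q-learning on a small square lattice where the exit carries a positive reward and boundary bumps a negative one. The grid size, reward values, and hyperparameters are illustrative assumptions, not the authors' plasma model or parameters.

```python
import numpy as np

# Minimal Q-learning maze sketch: a 4x4 open lattice (no internal walls),
# positive reward at the exit, negative "polarity" for hitting the boundary.
# All values are illustrative assumptions, not parameters from the paper.
N = 4
exit_state = N * N - 1                            # exit in the far corner
actions = [(-1, 0), (1, 0), (0, -1), (0, 1)]      # up, down, left, right

def step(state, a):
    r, c = divmod(state, N)
    dr, dc = actions[a]
    nr, nc = r + dr, c + dc
    if not (0 <= nr < N and 0 <= nc < N):         # hitting the boundary
        return state, -1.0
    nxt = nr * N + nc
    return nxt, 10.0 if nxt == exit_state else -0.1

Q = np.zeros((N * N, len(actions)))
alpha, gamma, eps = 0.5, 0.9, 0.1
rng = np.random.default_rng(0)

for episode in range(500):
    s = 0                                         # entry at the opposite corner
    while s != exit_state:
        a = rng.integers(4) if rng.random() < eps else int(np.argmax(Q[s]))
        s2, reward = step(s, a)
        # Q-learning update toward reward + discounted best next-state value
        Q[s, a] += alpha * (reward + gamma * Q[s2].max() - Q[s, a])
        s = s2

print(np.argmax(Q, axis=1).reshape(N, N))         # greedy action per cell
```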


Subjects
Body Fluids, Psychological Reinforcement, Reward, Plasma, Algorithms
2.
Elife; 12, 2024 Apr 02.
Article in English | MEDLINE | ID: mdl-38562050

ABSTRACT

In the unpredictable Anthropocene, a particularly pressing open question is how certain species invade urban environments. Sex-biased dispersal and learning arguably influence movement ecology, but their joint influence remains unexplored empirically and might vary across space and time. We assayed reinforcement learning in wild-caught, temporarily captive core-, middle-, or edge-range great-tailed grackles, a bird species undergoing a rapid, urban-tracking range expansion led by dispersing males. We show that, across populations, both sexes initially perform similarly when learning stimulus-reward pairings but, when reward contingencies reverse, male grackles finish 'relearning' faster than females, making fewer choice-option switches. How do male grackles do this? Bayesian cognitive modelling revealed that male grackles' choice behaviour is governed more strongly by the 'weight' of relative differences in recent foraging payoffs; that is, they show more pronounced risk-sensitive learning. Confirming this mechanism, agent-based forward simulations of reinforcement learning, in which we simulate 'birds' based on empirical estimates of our grackles' reinforcement learning, replicate our sex-difference behavioural data. Finally, evolutionary modelling revealed that natural selection should favour risk-sensitive learning in hypothesised urban-like environments: stable but stochastic settings. Together, these results imply that risk-sensitive learning is a winning strategy for urban-invasion leaders, underscoring the potential for life history and cognition to shape invasion success in human-modified environments.
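A minimal sketch of the kind of agent-based forward simulation described above: each simulated 'bird' updates attraction scores with an information-updating rate phi and chooses via a softmax whose payoff-difference weight lambda plays the role of risk-sensitive learning. The two-option reversal task and all parameter values are illustrative assumptions, not the study's empirical estimates.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_bird(phi, lam, n_trials=200, reverse_at=100):
    """Forward-simulate reversal learning for one 'bird'.

    phi : information-updating (learning) rate
    lam : payoff-difference weight in the softmax (risk sensitivity)
    All values are illustrative assumptions.
    """
    A = np.zeros(2)                              # attraction scores for two options
    correct = []
    for t in range(n_trials):
        rewarded = 0 if t < reverse_at else 1    # contingencies flip halfway
        p = np.exp(lam * A) / np.exp(lam * A).sum()
        choice = rng.choice(2, p=p)
        reward = 1.0 if choice == rewarded else 0.0
        A[choice] = (1 - phi) * A[choice] + phi * reward
        correct.append(choice == rewarded)
    return np.mean(correct[reverse_at:])         # accuracy after the reversal

# Agents with a larger payoff-difference weight relearn the reversal faster.
print(simulate_bird(phi=0.3, lam=2.0), simulate_bird(phi=0.3, lam=8.0))
```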


Subjects
Learning, Passeriformes, Animals, Humans, Female, Male, Bayes Theorem, Cognition, Psychological Reinforcement
3.
Accid Anal Prev; 201: 107570, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38614052

ABSTRACT

To improve the traffic safety and efficiency of freeway tunnels, this study proposes a novel variable speed limit (VSL) control strategy based on a model-based reinforcement learning framework (MBRL) with safety perception. The MBRL framework is designed by developing a multi-lane cell transmission model for freeway tunnels as an environment model, built so that agents can interact with the environment model while interacting with the real environment, improving the sampling efficiency of reinforcement learning. Based on a real-time crash risk prediction model for freeway tunnels that uses random deep and cross networks, the safety perception function inside the MBRL framework is developed. The reinforcement learning components fully account for the application conditions of most current tunnels, and the VSL control agent is trained using a deep Dyna-Q method. The control process uses a safety trigger mechanism to reduce the likelihood of crashes caused by frequent changes in speed. The efficacy of the proposed VSL strategies is validated through simulation experiments. The results show that the proposed VSL strategies significantly increase traffic safety performance by between 16.00% and 20.00% and traffic efficiency by between 3.00% and 6.50% compared to a fixed speed limit approach. Notably, the proposed strategies outperform the traditional VSL strategy based on a traffic-flow prediction model in terms of traffic safety and efficiency improvement, and they also outperform the VSL strategy based on a model-free reinforcement learning framework when sampling efficiency is also taken into account. In addition, the proposed strategies with safety triggers are safer than those without safety triggers. These findings demonstrate the potential for MBRL-based VSL strategies to improve traffic safety and efficiency within freeway tunnels.
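The Dyna-Q idea behind the MBRL framework can be summarised in a few lines: each real transition both updates the Q-table directly and is stored in a learned model from which extra planning updates are replayed, which is where the sampling-efficiency gain comes from. The skeleton below is generic; the tunnel traffic states, speed-limit actions, and reward signal are abstract placeholders, not the paper's cell-transmission model or deep network.

```python
import random
from collections import defaultdict

# Tabular Dyna-Q skeleton: direct RL updates from real transitions plus
# planning updates replayed from a learned environment model.
alpha, gamma, eps, n_planning = 0.1, 0.95, 0.1, 20
Q = defaultdict(float)
model = {}                       # (state, action) -> (reward, next_state)

def choose(state, actions):
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def dyna_q_update(state, action, reward, next_state, actions):
    # 1) direct RL update from the real transition
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
    # 2) store the transition in the environment model
    model[(state, action)] = (reward, next_state)
    # 3) planning: replay transitions sampled from the model
    for _ in range(n_planning):
        (s, a), (r, s2) = random.choice(list(model.items()))
        best = max(Q[(s2, b)] for b in actions)
        Q[(s, a)] += alpha * (r + gamma * best - Q[(s, a)])
```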


Subjects
Traffic Accidents, Automobile Driving, Psychological Reinforcement, Safety, Traffic Accidents/prevention & control, Humans, Automobile Driving/psychology, Environmental Planning, Computer Simulation, Theoretical Models
4.
Proc Natl Acad Sci U S A; 121(15): e2317618121, 2024 Apr 09.
Article in English | MEDLINE | ID: mdl-38557193

ABSTRACT

Throughout evolution, bacteria and other microorganisms have learned efficient foraging strategies that exploit characteristic properties of their unknown environment. While much research has been devoted to statistical models describing the dynamics of foraging bacteria and other (micro-)organisms, little is known about how good the learned strategies actually are. This knowledge gap is largely caused by the absence of methods for systematically developing alternative foraging strategies to compare with. In the present work, we use deep reinforcement learning to show that a smart run-and-tumble agent, which strives to find nutrients for its survival, learns motion patterns that are remarkably similar to the trajectories of chemotactic bacteria. Strikingly, despite this similarity, we also find interesting differences between the learned tumble-rate distribution and the one commonly assumed for the run-and-tumble model. We find that these differences equip the agent with significant advantages regarding its foraging and survival capabilities. Our results uncover a generic route to using deep reinforcement learning for discovering search and collection strategies that exploit characteristic but initially unknown features of the environment. These results can be used, e.g., to program future microswimmers, nanorobots, and smart active particles for tasks like searching for cancer cells, micro-waste collection, or environmental remediation.
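To make the setup concrete, here is a toy one-dimensional run-and-tumble agent whose tumble probability when the nutrient signal is decreasing is a single learnable parameter, trained with a crude REINFORCE update. The nutrient field, episode structure, and learning rule are illustrative assumptions and far simpler than the deep reinforcement learning setting used in the paper.

```python
import numpy as np

rng = np.random.default_rng(3)

def nutrient(x):
    return np.exp(-x ** 2)                 # toy 1-D nutrient field peaking at the origin

theta = 0.0                                # logit of tumble probability when the signal worsens
alpha = 0.05                               # learning rate for the policy parameter

for episode in range(2000):
    x, heading = 5.0, -1.0
    prev = nutrient(x)
    grad_log, total = 0.0, 0.0
    for t in range(50):
        x += 0.2 * heading                 # "run" step along the current heading
        cur = nutrient(x)
        worse = cur < prev                 # nutrient signal decreased since last step
        p_tumble = 1.0 / (1.0 + np.exp(-theta)) if worse else 0.05
        tumble = rng.random() < p_tumble
        if tumble:
            heading = rng.choice([-1.0, 1.0])
        if worse:                          # accumulate d(log pi)/d(theta) for learnable decisions
            grad_log += (1.0 - p_tumble) if tumble else -p_tumble
        total += cur
        prev = cur
    # Crude REINFORCE update (no baseline): nudge theta along return * grad(log policy)
    theta += alpha * (total / 50.0) * grad_log

print(1.0 / (1.0 + np.exp(-theta)))        # learned tumble probability when moving "downhill"
```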


Subjects
Learning, Psychological Reinforcement, Statistical Models, Motion (Physics), Bacteria
5.
Sci Robot; 9(89): eadi9579, 2024 Apr 17.
Article in English | MEDLINE | ID: mdl-38630806

ABSTRACT

Humanoid robots that can autonomously operate in diverse environments have the potential to help address labor shortages in factories, assist the elderly at home, and colonize new planets. Although classical controllers for humanoid robots have shown impressive results in a number of settings, they are challenging to generalize and adapt to new environments. Here, we present a fully learning-based approach for real-world humanoid locomotion. Our controller is a causal transformer that takes the history of proprioceptive observations and actions as input and predicts the next action. We hypothesized that the observation-action history contains useful information about the world that a powerful transformer model can use to adapt its behavior in context, without updating its weights. We trained our model with large-scale model-free reinforcement learning on an ensemble of randomized environments in simulation and deployed it to the real world zero-shot. Our controller could walk over various outdoor terrains, was robust to external disturbances, and could adapt in context.


Subjects
Robotics, Humans, Aged, Robotics/methods, Locomotion, Walking, Learning, Psychological Reinforcement
6.
Proc Natl Acad Sci U S A; 121(16): e2303165121, 2024 Apr 16.
Article in English | MEDLINE | ID: mdl-38607932

ABSTRACT

Antimicrobial resistance was estimated to be associated with 4.95 million deaths worldwide in 2019. It is possible to frame the antimicrobial resistance problem as a feedback-control problem. If we could optimize this feedback-control problem and translate our findings to the clinic, we could slow, prevent, or reverse the development of high-level drug resistance. Prior work on this topic has relied on systems where the exact dynamics and parameters were known a priori. In this study, we extend this work using a reinforcement learning (RL) approach capable of learning effective drug cycling policies in a system defined by empirically measured fitness landscapes. Crucially, we show that it is possible to learn effective drug cycling policies despite the problems of noisy, limited, or delayed measurement. Given access to a panel of 15 β-lactam antibiotics with which to treat the simulated Escherichia coli population, we demonstrate that RL agents outperform two naive treatment paradigms at minimizing the population fitness over time. We also show that RL agents approach the performance of the optimal drug cycling policy. Even when stochastic noise is introduced to the measurements of population fitness, we show that RL agents are capable of maintaining evolving populations at lower growth rates compared to controls. We further tested our approach in arbitrary fitness landscapes of up to 1,024 genotypes. We show that minimization of population fitness using drug cycles is not limited by increasing genome size. Our work represents a proof-of-concept for using AI to control complex evolutionary processes.
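A toy sketch of the control loop described above: a tabular Q-learning agent chooses which drug to apply at each step, receives the negative population fitness as its reward, and the population climbs toward fitter genotypes under the chosen drug. The three random landscapes, the single-mutation hill-climbing dynamics, and all hyperparameters are illustrative assumptions, not the paper's empirical landscapes or simulated E. coli dynamics.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting: 3 drugs, 8 genotypes (3 loci). Each drug gets a random fitness
# landscape over genotypes -- illustrative stand-ins for empirical landscapes.
n_drugs, n_genotypes = 3, 8
landscape = rng.uniform(0.1, 1.0, size=(n_drugs, n_genotypes))

Q = np.zeros((n_genotypes, n_drugs))
alpha, gamma, eps = 0.2, 0.9, 0.1

def evolve(genotype, drug):
    """One evolution step: the population moves to the fittest neighbouring
    genotype (single-bit mutation, or staying put) under the selected drug."""
    neighbours = [genotype ^ (1 << b) for b in range(3)] + [genotype]
    return max(neighbours, key=lambda g: landscape[drug, g])

state = 0                                    # wild-type genotype
for step in range(2000):
    drug = rng.integers(n_drugs) if rng.random() < eps else int(np.argmax(Q[state]))
    nxt = evolve(state, drug)
    reward = -landscape[drug, nxt]           # reward = negative population fitness
    Q[state, drug] += alpha * (reward + gamma * Q[nxt].max() - Q[state, drug])
    state = nxt
```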


Subjects
Anti-Infective Agents, Learning, Psychological Reinforcement, Microbial Drug Resistance, Cycling, Escherichia coli/genetics
7.
Neuropsychologia; 197: 108847, 2024 May 03.
Article in English | MEDLINE | ID: mdl-38460774

ABSTRACT

Methamphetamine use disorder (MUD), a major public health risk, is associated with dysfunctional neural feedback processing. Although dysfunctional feedback processing in people with substance dependence has been explored in several behavioral, computational, and electrocortical studies, this mechanism in MUD remains poorly understood. Furthermore, the current understanding of latent components of behavior, such as learning speed and the exploration-exploitation dilemma, is still limited. In addition, the association between these latent cognitive components and the related neural mechanisms also needs to be explored. Therefore, in this study, the neurocognitive mechanisms underlying feedback processing were evaluated in individuals with MUD and in age/gender-matched healthy controls within a probabilistic learning task with rewards and punishments. Mathematical modeling based on the Q-learning paradigm suggested that individuals with MUD show less sensitivity in distinguishing optimal options. They also exhibited a slight decrease in their ability to learn from negative feedback compared with healthy controls. At the level of the underlying neural mechanisms, individuals with MUD showed lower theta power over medial-frontal areas while responding to negative feedback. However, other EEG measures of reinforcement learning, including feedback-related negativity, parietal P300, and activity flow from the medial frontal to lateral prefrontal regions, remained intact. On the other hand, the linkage between value sensitivity and medial-frontal theta activity was absent in individuals with MUD. The observed dysfunction could be due to the adverse effects of methamphetamine on the cortico-striatal dopamine circuit, reflected in the activity of the anterior cingulate cortex, the region most likely responsible for efficient behavior adjustment. These findings could help pave the way toward tailored therapeutic approaches.
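A compact sketch of the kind of Q-learning model such studies typically fit to choice data: separate learning rates for positive and negative feedback, plus an inverse-temperature parameter playing the role of value sensitivity. The two-option task structure and parameter names are illustrative assumptions, not the study's exact model.

```python
import numpy as np

def q_learning_nll(params, choices, rewards):
    """Negative log-likelihood of two-option choice data under a Q-learning model
    with separate learning rates for positive/negative feedback and an inverse
    temperature beta ("sensitivity"). Illustrative sketch, not the paper's model.

    choices : array of 0/1 chosen options
    rewards : array of received outcomes (e.g., +1 reward, -1 punishment)
    """
    alpha_pos, alpha_neg, beta = params
    Q = np.zeros(2)
    nll = 0.0
    for c, r in zip(choices, rewards):
        p = np.exp(beta * Q - np.max(beta * Q))
        p = p / p.sum()
        nll -= np.log(p[c] + 1e-12)
        pe = r - Q[c]                              # prediction error
        lr = alpha_pos if pe >= 0 else alpha_neg   # feedback-dependent learning rate
        Q[c] += lr * pe                            # update the chosen option only
    return nll

# Parameters could then be estimated per participant, e.g. with
# scipy.optimize.minimize(q_learning_nll, x0=[0.3, 0.3, 3.0], args=(choices, rewards)).
```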


Subjects
Electroencephalography, Methamphetamine, Humans, Male, Electroencephalography/methods, Methamphetamine/adverse effects, Feedback, Psychological Reinforcement, Reward
8.
PLoS Comput Biol; 20(3): e1011950, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38552190

ABSTRACT

Active reinforcement learning enables dynamic prediction and control, where one should not only maximize rewards but also minimize costs such as those of inference, decisions, actions, and time. For an embodied agent such as a human, decisions are also shaped by physical aspects of actions. Beyond the effects of reward outcomes on learning processes, to what extent can modeling of behavior in a reinforcement-learning task be complicated by other sources of variance in sequential action choices? What of the effects of action bias (for actions per se) and action hysteresis determined by the history of actions chosen previously? The present study addressed these questions with incremental assembly of models for the sequential choice data from a task with hierarchical structure for additional complexity in learning. With systematic comparison and falsification of computational models, human choices were tested for signatures of parallel modules representing not only an enhanced form of generalized reinforcement learning but also action bias and hysteresis. We found evidence for substantial differences in bias and hysteresis across participants, even comparable in magnitude to the individual differences in learning. Individuals who did not learn well revealed the greatest biases, but those who did learn accurately were also significantly biased. The direction of hysteresis varied among individuals as repetition or, more commonly, alternation biases persisting from multiple previous actions. Considering that these actions were button presses with trivial motor demands, the idiosyncratic forces biasing sequences of action choices were robust enough to suggest ubiquity across individuals and across tasks requiring various actions. In light of how bias and hysteresis function as a heuristic for efficient control that adapts to uncertainty or low motivation by minimizing the cost of effort, these phenomena broaden the consilient theory of a mixture of experts to encompass a mixture of expert and nonexpert controllers of behavior.


Subjects
Learning, Psychological Reinforcement, Humans, Reward, Problem-Based Learning, Bias
9.
Artif Intell Med; 150: 102811, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38553154

ABSTRACT

Sepsis is the third leading cause of death worldwide. Antibiotics are an important component of sepsis treatment, but their use currently faces the challenge of increasing antibiotic resistance (Evans et al., 2021). Sepsis medication prediction can be modeled as a Markov decision process, but existing methods fail to integrate medical knowledge, so the decision process may deviate from medical common sense and underperform (Wang et al., 2021). In this paper, we use a Deep Q-Network (DQN) to construct a Sepsis Anti-infection DQN (SAI-DQN) model to address the challenge of determining the optimal combination and duration of antibiotics in sepsis treatment. By encoding sepsis clinical knowledge in the reward function to guide the DQN toward compliance with medical guidelines, we formed personalized treatment recommendations for antibiotic combinations. The results showed that our model achieved a higher average decision value than clinical decisions. For the test set of patients, our model predicts that 79.07% of patients will achieve a favorable prognosis with the recommended combination of antibiotics. By statistically analyzing decision trajectories and drug selections, our model was able to provide reasonable medication recommendations that comply with clinical practice. Our model was able to improve patient outcomes by recommending appropriate antibiotic combinations in line with established clinical knowledge.
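One common way to express "clinical knowledge as reward functions" is reward shaping: the outcome-based reward is combined with penalty terms for guideline violations before computing the usual DQN bootstrap target. The sketch below shows this pattern in isolation; the penalty weight and the guideline check are hypothetical placeholders, not the SAI-DQN reward design.

```python
import numpy as np

def clinical_reward(outcome_reward, action, state, violates_guideline, penalty=1.0):
    """Outcome-based reward plus a clinical-knowledge shaping term.

    violates_guideline(action, state) is a hypothetical callable encoding rules
    such as contraindicated antibiotic combinations; the penalty weight is an
    illustrative assumption.
    """
    return outcome_reward - (penalty if violates_guideline(action, state) else 0.0)

def dqn_target(reward, next_q_values, done, gamma=0.99):
    """Standard DQN bootstrap target: y = r + gamma * max_a' Q_target(s', a')."""
    return reward if done else reward + gamma * float(np.max(next_q_values))
```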


Subjects
Anti-Bacterial Agents, Sepsis, Humans, Anti-Bacterial Agents/therapeutic use, Sepsis/diagnosis, Sepsis/drug therapy, Prognosis, Psychological Reinforcement
10.
J Appl Behav Anal; 57(2): 455-462, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38438320

ABSTRACT

Functional communication training (FCT) is an evidence-based treatment for behavior targeted for reduction that often combines extinction for target responses with functionally equivalent reinforcement for alternative behavior. The long-term effectiveness of FCT can become compromised when transitioning from clinic to nonclinic contexts or when thinning reinforcement schedules for appropriate behavior. Such increases in targeted behavior have been conceptualized as renewal and resurgence, respectively. The relation between resurgence and renewal has yet to be reported. Therefore, the present report retrospectively analyzed the relation between renewal and resurgence in data collected when implementing FCT with children diagnosed with developmental disabilities. We found no relation when evaluating either all 34 individuals assessed for resurgence and renewal or a subset of individuals exhibiting both resurgence and renewal. These findings suggest that one form of relapse may not be predictive of another form of relapse.


Subjects
Behavior Therapy, Psychological Extinction, Child, Humans, Retrospective Studies, Psychological Extinction/physiology, Psychological Reinforcement, Recurrence, Reinforcement Schedule, Operant Conditioning/physiology
11.
J Appl Behav Anal; 57(2): 426-443, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38438321

ABSTRACT

The functional analysis approach described by Iwata et al. (1982/1994) has been used widely to determine the variables evoking and maintaining challenging behavior. However, one potential concern with conducting functional analyses is that repeated exposure to contingencies may induce a novel functional relation. To examine the likelihood of these potential iatrogenic effects, we evaluated social test conditions of the functional analysis for 116 participants and searched for patterns of responding indicative of acquisition. Patterns suggestive of acquisition occurred in 13.70% of tangible reinforcement conditions; however, the prevalence was only slightly lower in the attention condition (8.75%). Much lower prevalence was observed for the escape condition (2.13%). When grouped by quotient score, a pattern of acquisition was just as likely to be observed in the attention condition as in the tangible condition. Additionally, patterns indicative of acquisition were no more likely to be observed with participants who emitted automatically reinforced challenging behavior.


Subjects
Child Behavior Disorders, Psychological Reinforcement, Humans, Child, Behavior Therapy, Attention, Probability
12.
Learn Mem; 31(3), 2024 Mar.
Article in English | MEDLINE | ID: mdl-38527752

ABSTRACT

From early in life, we encounter both controllable environments, in which our actions can causally influence the reward outcomes we experience, and uncontrollable environments, in which they cannot. Environmental controllability is theoretically proposed to organize our behavior. In controllable contexts, we can learn to proactively select instrumental actions that bring about desired outcomes. In uncontrollable environments, Pavlovian learning enables hard-wired, reflexive reactions to anticipated, motivationally salient events, providing "default" behavioral responses. Previous studies characterizing the balance between Pavlovian and instrumental learning systems across development have yielded divergent findings, with some studies observing heightened expression of Pavlovian learning during adolescence and others observing a reduced influence of Pavlovian learning during this developmental stage. In this study, we aimed to investigate whether a theoretical model of controllability-dependent arbitration between learning systems might explain these seemingly divergent findings in the developmental literature, with the specific hypothesis that adolescents' action selection might be particularly sensitive to environmental controllability. To test this hypothesis, 90 participants, aged 8-27, performed a probabilistic-learning task that enables estimation of Pavlovian influence on instrumental learning, across both controllable and uncontrollable conditions. We fit participants' data with a reinforcement-learning model in which controllability inferences adaptively modulate the dominance of Pavlovian versus instrumental control. Relative to children and adults, adolescents exhibited greater flexibility in calibrating the expression of Pavlovian bias to the degree of environmental controllability. These findings suggest that sensitivity to environmental reward statistics that organize motivated behavior may be heightened during adolescence.
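One simple way to formalise "controllability inferences adaptively modulate the dominance of Pavlovian versus instrumental control" is a weighted mixture of the two systems' values, with the weight tracking how well outcomes are predicted by the agent's own actions. The sketch below is a generic illustration under that assumption, not the specific model fitted in the study.

```python
import numpy as np

def choice_probabilities(v_pavlovian, q_instrumental, omega, beta=5.0):
    """Softmax over a mixture of Pavlovian values and instrumental Q-values.

    omega in [0, 1] is the inferred-controllability weight: larger omega shifts
    control toward the instrumental system. The linear mixture and softmax
    temperature are illustrative modelling assumptions.
    """
    w = (1.0 - omega) * np.asarray(v_pavlovian) + omega * np.asarray(q_instrumental)
    e = np.exp(beta * (w - w.max()))
    return e / e.sum()

def update_controllability(omega, action_predicted_outcome, rate=0.1):
    """Nudge the controllability estimate toward 1 when the last outcome was
    better predicted by the chosen action than by the stimulus alone, else
    toward 0 (a simple delta-rule stand-in for the model's inference step)."""
    target = 1.0 if action_predicted_outcome else 0.0
    return omega + rate * (target - omega)
```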


Subjects
Classical Conditioning, Learning, Adult, Child, Humans, Adolescent, Classical Conditioning/physiology, Learning/physiology, Psychological Reinforcement, Operant Conditioning/physiology, Reward
13.
Neural Netw; 174: 106236, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38518710

ABSTRACT

In speech enhancement tasks, local and non-local attention mechanisms have been significantly improved and well studied. However, a natural speech signal contains many dynamic and fast-changing acoustic features, and focusing on one type of attention mechanism (local or non-local) cannot precisely capture the most discriminative information for estimating target speech from background interference. To address this issue, we introduce an adaptive selection network to dynamically select an appropriate route that determines whether to use attention mechanisms and which one to use for the task. We train the adaptive selection network using reinforcement learning with a difficulty-adjusted reward that reflects the performance, complexity, and difficulty of estimating target speech from noisy mixtures. Consequently, we propose an Attention Selection Speech Enhancement Network (ASSENet) with an innovative dynamic block that consists of an adaptive selection network and a local and non-local attention-based speech enhancement network. In particular, ASSENet incorporates both local and non-local attention and develops an attention-mechanism selection technique to explore the appropriate route of local and non-local attention mechanisms for speech enhancement tasks. The results show that our method achieves performance comparable or superior to existing approaches at attractive computational cost.


Subjects
Learning, Speech, Psychological Reinforcement, Reward, Noise
14.
Neural Netw; 174: 106243, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38531123

ABSTRACT

Generative Flow Networks (GFlowNets) aim to generate diverse trajectories from a distribution in which the final states of the trajectories are proportional to the reward, serving as a powerful alternative to reinforcement learning for exploratory control tasks. However, the individual-flow matching constraint in GFlowNets limits their application to multi-agent systems, especially continuous joint-control problems. In this paper, we propose a novel Multi-Agent generative Continuous Flow Networks (MACFN) method to enable multiple agents to perform cooperative exploration for various compositional continuous objects. Technically, MACFN trains decentralized individual-flow-based policies in a centralized global-flow-based matching fashion. During centralized training, MACFN introduces a continuous flow decomposition network to deduce the flow contributions of each agent in the presence of only global rewards. Agents can then deliver actions solely based on their assigned local flow in a decentralized way, forming a joint policy distribution proportional to the rewards. To guarantee the expressiveness of the continuous flow decomposition, we theoretically derive a consistency condition on the decomposition network. Experimental results demonstrate that the proposed method yields results superior to state-of-the-art counterparts and better exploration capability. Our code is available at https://github.com/isluoshuang/MACFN.


Subjects
Learning, Policies, Psychological Reinforcement, Reward
15.
Neural Netw; 174: 106246, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38547801

ABSTRACT

The agent learns to organize decision behavior to achieve a behavioral goal, such as reward maximization, and reinforcement learning is often used for this optimization. Learning an optimal behavioral strategy is difficult when the events necessary for learning are only partially observable, a setting called a Partially Observable Markov Decision Process (POMDP). However, real-world environments also present many events irrelevant to reward delivery and to the optimal behavioral strategy. The conventional methods for POMDPs, which attempt to infer transition rules among all observations, including irrelevant states, are ineffective in such environments. Assuming a Redundantly Observable Markov Decision Process (ROMDP), we propose here a method for goal-oriented reinforcement learning that efficiently learns state transition rules among reward-related "core states" from redundant observations. Starting with a small number of initial core states, our model gradually adds new core states to the transition diagram until it achieves an optimal behavioral strategy consistent with the Bellman equation. We demonstrate that the resulting inference model outperforms the conventional method for POMDPs. We emphasize that our model, which contains only the core states, has high explainability. Furthermore, the proposed method suits online learning, as it suppresses memory consumption and improves learning speed.


Subjects
Goals, Learning, Psychological Reinforcement, Reward, Markov Chains
16.
Physiol Behav; 279: 114531, 2024 May 15.
Article in English | MEDLINE | ID: mdl-38552705

ABSTRACT

It is well known that a large portion of the population increases their intake of high energy-dense foods during times of stress; however, whether stress affects the reinforcing value of a food reward is understudied. Further knowledge of this relationship may help us better understand the positive correlation between the reinforcing value of food and obesity. Therefore, we tested whether an acute stressor would increase the reinforcing value of low or high energy-dense food. Participants (N = 70) were randomized to a stress or no-stress condition, after which they were allowed to work to gain access to a food reward and to reading time. To determine whether high energy-dense food was specifically affected, half the participants in each stress condition were randomly assigned to work for either grapes or chocolate candies. Participants in the stress condition worked less for food access than those in the no-stress condition, for both low and high energy-dense foods, but stress did not affect the reinforcing value of reading time. These results indicate that, contrary to our hypothesis, in a sample of college students an acute stressor decreased the reinforcing value of food, with no difference between a low and a high energy-dense food item.


Subjects
Energy Intake, Food Preferences, Humans, Psychological Reinforcement, Feeding Behavior, Students
17.
Proc Natl Acad Sci U S A; 121(12): e2317751121, 2024 Mar 19.
Article in English | MEDLINE | ID: mdl-38489382

ABSTRACT

Do people's attitudes toward the (a)symmetry of an outcome distribution affect their choices? Financial investors seek return distributions with frequent small returns but few large ones, consistent with leading models of choice in economics and finance that assume right-skewed preferences. In contrast, many experiments in which decision-makers learn about choice options through experience find the opposite choice tendency, in favor of left-skewed options. To reconcile these seemingly contradicting findings, the present work investigates the effect of skewness on choices in experience-based decisions. Across seven studies, we show that apparent preferences for left-skewed outcome distributions are a consequence of those distributions having a higher value in most direct outcome comparisons, a "frequent-winner effect." By manipulating which option is the frequent winner, we show that choice tendencies for frequent winners can be obtained even with identical outcome distributions. Moreover, systematic choice tendencies in favor of right- or left-skewed options can be obtained by manipulating which option is experienced as the frequent winner. We also find evidence for an intrinsic preference for right-skewed outcome distributions. The frequent-winner phenomenon is robust to variations in outcome distributions and experimental paradigms. These findings are confirmed by computational analyses in which a reinforcement-learning model capturing frequent winning and intrinsic skewness preferences provides the best account of the data. Our work reconciles conflicting findings of aggregated behavior in financial markets and experiments and highlights the need for theories of decision-making sensitive to joint outcome distributions of the available options.
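The frequent-winner effect described above can be reproduced in a few lines: two options with identical expected value but opposite skew, where the left-skewed option delivers the higher outcome in most direct trial-by-trial comparisons. The payoff values and probabilities below are illustrative, not the experiments' stimuli.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Two options with identical expected value (0) but opposite skew.
left_skewed  = rng.choice([ 1.0, -9.0], size=n, p=[0.9, 0.1])  # frequent small wins
right_skewed = rng.choice([-1.0,  9.0], size=n, p=[0.9, 0.1])  # frequent small losses

# In direct trial-by-trial comparisons the left-skewed option "wins" most often,
# even though both options have the same mean -- the frequent-winner effect.
print(left_skewed.mean(), right_skewed.mean())   # both close to 0
print((left_skewed > right_skewed).mean())        # close to 0.81
```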


Subjects
Choice Behavior, Decision Making, Humans, Learning, Psychological Reinforcement
18.
Cogn Affect Behav Neurosci; 24(2): 384-387, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38459406

ABSTRACT

There is a growing focus on the computational aspects of psychiatric disorders in humans. This idea also is gaining traction in nonhuman animal studies. Commenting on a new comprehensive overview of the benefits of applying this approach in translational research by Neville et al. (Cognitive Affective & Behavioral Neuroscience 1-14, 2024), we discuss the implications for translational model validity within this framework. We argue that thinking computationally in translational psychiatry calls for a change in the way that we evaluate animal models of human psychiatric processes, with a shift in focus towards symptom-producing computations rather than the symptoms themselves. Further, in line with Neville et al.'s adoption of the reinforcement learning framework to model animal behaviour, we illustrate how this approach can be applied beyond simple decision-making paradigms to model more naturalistic behaviours.


Subjects
Translational Biomedical Research, Humans, Translational Biomedical Research/methods, Animals, Mental Disorders, Psychiatry/methods, Psychiatry/trends, Thinking/physiology, Psychological Reinforcement, Animal Disease Models
19.
Front Public Health; 12: 1229262, 2024.
Article in English | MEDLINE | ID: mdl-38504677

ABSTRACT

Introduction: The Community Reinforcement Approach (CRA) is an evidence-based treatment modality for alcohol and drug addiction with proven efficacy and cost-effectiveness. The present study investigated the effectiveness of the CRA in the context of quality of life among individuals with drug addiction.
Materials and methods: A total of 60 post-detoxification inpatients with substance use disorders at Fountain House, Lahore, Pakistan, participated in this study. Fountain House was selected because the Minnesota model is primarily used there; a new treatment approach was therefore introduced to investigate its effectiveness for individuals with substance abuse. A randomized 12-week trial was conducted as a substance use disorders (SUDs) treatment program. Persons with SUD (i.e., identified patients) enrolled in a residential treatment program were randomized to an integrated model of the CRA plus traditional Minnesota model treatment (n = 30) or traditional Minnesota model treatment only (TMM; n = 30). All participants attended the group therapy sessions and other activities in the facility in addition to their assigned treatment condition. Participants in the experimental group additionally attended individual therapeutic sessions conducted according to CRA guidelines; each individual in the CRA group received 12 one-to-one sessions lasting 45 min to 1 h. The WHOQOL-BREF scale and the Happiness Scale (1) were used for data collection.
Results: The results showed a significant increase in the quality of life of participants in the CRA treatment group compared with the TMM control group. The findings also indicated that individuals in the CRA group had improved levels of happiness compared with individuals receiving TMM.
Discussion: The CRA is an effective and adaptable treatment approach that works well in combination with other treatment approaches. Its proven efficacy, compatibility, and cost-effectiveness distinguish it from other treatment methods.
Implications: The CRA should be adapted, assessed, and evaluated further, especially in Pakistan, where there is a pressing need to adopt an effective treatment strategy for addiction problems.


Subjects
Quality of Life, Substance-Related Disorders, Humans, Happiness, Psychological Reinforcement, Behavior Therapy/methods, Substance-Related Disorders/therapy
20.
Neurosci Biobehav Rev; 160: 105623, 2024 May.
Article in English | MEDLINE | ID: mdl-38490499

ABSTRACT

Foraging is a natural behavior that involves making sequential decisions to maximize rewards while minimizing the costs incurred when doing so. The prevalence of foraging across species suggests that a common brain computation underlies its implementation. Although the anterior cingulate cortex is believed to contribute to foraging behavior, its specific role has been contentious, with predominant theories arguing that it encodes either environmental value or choice difficulty. Additionally, recent attempts to characterize foraging have taken place within the reinforcement learning framework, with increasingly complex models scaling with task complexity. Here we review reinforcement learning foraging models, highlighting the hierarchical structure of many foraging problems. We extend this literature by proposing that ACC guides foraging according to principles of model-based hierarchical reinforcement learning. This idea holds that ACC function is organized hierarchically along a rostral-caudal gradient, with rostral structures monitoring the status and completion of high-level task goals (like finding food), and midcingulate structures overseeing the execution of task options (subgoals, like harvesting fruit) and lower-level actions (such as grabbing an apple).


Subjects
Decision Making, Cingulate Gyrus, Humans, Animals, Psychological Reinforcement, Reward, Animal Behavior, Choice Behavior